Fix/realtime tts voice rewire#181
Merged
rmittal-github merged 10 commits on May 26, 2026
Defer the pyaudio import to the points where it is actually needed (MicrophoneStream.__enter__, SoundCallBack.__init__, list_*_devices, get_*_info). Default WAV-output flows now work on machines without PortAudio headers installed. When pyaudio is missing, raise an ImportError that explicitly tells the user to install portaudio19-dev first, addressing the VDR finding that fresh-box users got blocked by a bare ModuleNotFoundError with no install instructions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
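For illustration, the deferred import described above could look roughly like the sketch below. `require_pyaudio`, `PYAUDIO_INSTALL_HINT`, and `MicrophoneStreamSketch` are hypothetical names, and the hint wording is assumed; this is not the code in audio_io, only the general shape of the pattern.

```python
import importlib

# Hint text modelled on the commit description; the exact wording is an assumption.
PYAUDIO_INSTALL_HINT = (
    "pyaudio is required for microphone and speaker I/O. "
    "Install the PortAudio headers first (Debian/Ubuntu: "
    "`apt-get install -y portaudio19-dev`; macOS: `brew install portaudio`), "
    "then run `pip install pyaudio`."
)


def require_pyaudio():
    """Import pyaudio on demand, or raise an ImportError with install steps."""
    try:
        return importlib.import_module("pyaudio")
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        raise ImportError(PYAUDIO_INSTALL_HINT) from exc


class MicrophoneStreamSketch:
    """Hypothetical stand-in for a microphone wrapper; not the library's class."""

    def __init__(self, rate=16000, chunk=1600):
        # No pyaudio import here, so constructing the object is always safe.
        self.rate = rate
        self.chunk = chunk
        self._audio = None
        self._stream = None

    def __enter__(self):
        pyaudio = require_pyaudio()  # the import happens only when audio is opened
        self._audio = pyaudio.PyAudio()
        self._stream = self._audio.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=self.rate,
            input=True,
            frames_per_buffer=self.chunk,
        )
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._stream is not None:
            self._stream.close()
        if self._audio is not None:
            self._audio.terminate()
        return False
```

With this shape, code paths that only write WAV files never touch pyaudio, and the failure on a fresh machine carries the install steps instead of a bare ModuleNotFoundError.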
The riva-asr/nmt/tts client scripts historically exit 0 on most error paths, including "Unavailable model", connection refused, empty/invalid input, and missing files, which causes CI pipelines composing these scripts via && chains to silently swallow real failures.
Add a cli_main decorator that translates uncaught exceptions into a small, consistent set of exit codes:
2 = bad input (missing/empty file, ValueError, IsADirectoryError)
3 = gRPC UNAVAILABLE (server down, wrong port, network)
4 = gRPC INVALID_ARGUMENT / NOT_FOUND (bad model/lang/voice)
1 = anything else
130 = SIGINT
The decorator also writes the error to stderr so CI logs surface the cause rather than the script swallowing it. A follow-up commit wires this into each client script.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
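A minimal sketch of a decorator with the exit-code mapping listed above, for readers who want to see the shape of it. Apart from cli_main and EXIT_BAD_INPUT, every name here is hypothetical. To keep the sketch runnable without grpc installed, it reads the status name through a duck-typed `code()` call; that lookup is an assumption and the merged implementation may well check `grpc.RpcError` directly.

```python
import functools
import sys

EXIT_OK = 0
EXIT_ERROR = 1
EXIT_BAD_INPUT = 2
EXIT_UNAVAILABLE = 3   # hypothetical constant name
EXIT_BAD_REQUEST = 4   # hypothetical constant name
EXIT_SIGINT = 130


def _status_name(exc):
    """Return the gRPC status name if exc looks like an RpcError, else None."""
    code = getattr(exc, "code", None)
    if not callable(code):
        return None
    return getattr(code(), "name", None)


def cli_main(func):
    """Translate uncaught exceptions from a CLI main() into exit codes."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            result = func(*args, **kwargs)
            return EXIT_OK if result is None else result
        except KeyboardInterrupt:
            print("Interrupted", file=sys.stderr)
            return EXIT_SIGINT
        except (FileNotFoundError, IsADirectoryError, ValueError) as exc:
            print(f"Bad input: {exc}", file=sys.stderr)
            return EXIT_BAD_INPUT
        except Exception as exc:
            # Always surface the cause on stderr so CI logs show it.
            print(f"Error: {exc}", file=sys.stderr)
            status = _status_name(exc)
            if status == "UNAVAILABLE":
                return EXIT_UNAVAILABLE
            if status in ("INVALID_ARGUMENT", "NOT_FOUND"):
                return EXIT_BAD_REQUEST
            return EXIT_ERROR

    return wrapper
```

A script would then end with `sys.exit(main())`, so that a failure in one step of an `&&` chain stops the chain.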
…ation
Address the VDR 26.02 finding that python-clients CLIs exit 0 on most error paths across all three modalities. Each script now:
- Wraps main() with @cli_main so gRPC and OS errors propagate to a real exit code instead of being printed and swallowed.
- Calls sys.exit(main()) so the chosen exit code reaches the shell.
Script-specific fixes:
scripts/nmt/nmt.py
- Drop the inner request() try/except that swallowed every gRPC status; let cli_main translate it. Empty/whitespace --text and missing --text-file now return EXIT_BAD_INPUT (was: silent exit 0). Document --max-len-variation as decoder-token units with valid range [0, 256], default 20, and Arabic chunking note.
scripts/tts/talk.py
- Reject whitespace-only --text up front (defense-in-depth pair to the server-side fix in riva-speech that closed the hang on `--text " "`). Drop the broad `except Exception` that stringified gRPC errors and exited 0.
scripts/asr/transcribe_file*.py
- Replace `print(...); return` on missing input files with EXIT_BAD_INPUT. Remove the silent grpc.RpcError swallow in transcribe_file_offline.py.
scripts/asr/transcribe_mic.py + realtime_asr_client.py + tts/talk.py
- The pyaudio install hint now mentions `apt-get install -y portaudio19-dev` (Debian/Ubuntu) and `brew install portaudio` (macOS), pairing with the prereqs doc landed in documentation_2.
scripts/tts/realtime_tts_client.py
- Drop the module-level `from riva.client.audio_io import SoundCallBack` import (it was unused and pulled pyaudio in eagerly, defeating the lazy import). Drop the broad `except Exception` that mapped every failure to exit 1.
scripts/nmt/nmt_speech_to_{text,speech}.py
- Drop unused `import grpc`; remove the catch-all that printed "Error during translation" and exited 0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
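The input-validation part of this commit can be sketched as follows. This is a simplified, hypothetical `main()` (the real scripts take many more arguments and send a request where this one prints), assuming only that EXIT_BAD_INPUT is 2 as stated elsewhere in this PR.

```python
import argparse
import os
import sys

EXIT_BAD_INPUT = 2


def load_texts(args):
    """Return the non-empty input strings from --text or --text-file."""
    if args.text is not None:
        return [args.text] if args.text.strip() else []
    with open(args.text_file, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Input-validation sketch")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--text")
    group.add_argument("--text-file")
    args = parser.parse_args(argv)

    if args.text_file is not None and not os.path.isfile(args.text_file):
        print(f"Input file not found: {args.text_file}", file=sys.stderr)
        return EXIT_BAD_INPUT

    texts = load_texts(args)
    if not texts:
        # Covers whitespace-only --text and a --text-file with no non-empty lines.
        print("No non-empty input text supplied", file=sys.stderr)
        return EXIT_BAD_INPUT

    for text in texts:
        print(text)  # placeholder for the actual request
    return 0


# In a script, the last line would be: sys.exit(main())
```

Returning the code from `main()` and passing it to `sys.exit` keeps the validation testable without spawning a subprocess.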
VDR 26.02 found that realtime_tts_client.py silently ignored --voice and fell back to the server default (Mia). Tracing the WebSocket flow, the synthesize_session.update payload was built by deep-mutating the response from POST /v1/realtime/synthesis_sessions, an InitialSynthesisSessionConfig that carries id/object/client_secret fields not present in BaseSynthesisSessionConfig (the type the server validates the update against). Carrying those keys through to the override, plus the shallow .copy() + _safe_update_config nested-dict mutation, was the path that let the voice_name override fail to land on published 26.02 NIMs.
Build the update payload explicitly from CLI args instead, so only fields the user actually overrode reach the server, in the exact shape documented in the SynthesisSessionUpdateMessage schema. Bump the override summary to INFO so users can see which fields were sent. After the synthesize_session.updated response, compare the server-applied voice_name and language_code against what was requested and log a WARNING on mismatch. This is defense-in-depth, so any future server-side drop surfaces in the client log instead of as a wrong-sounding audio file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
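The override-only approach might be sketched like this. The flat `session` object and both function names are assumptions made for illustration; the actual nesting is whatever the SynthesisSessionUpdateMessage schema specifies, and only the event name and the voice_name/language_code fields come from this PR.

```python
import logging

logger = logging.getLogger("realtime_tts_sketch")


def build_session_update(voice_name=None, language_code=None):
    """Build an update message containing only the fields the user overrode."""
    overrides = {}
    if voice_name is not None:
        overrides["voice_name"] = voice_name
    if language_code is not None:
        overrides["language_code"] = language_code
    logger.info("Session overrides sent: %s", overrides)
    # Hypothetical message shape: no id/object/client_secret keys are carried over
    # because nothing is copied from the session-creation response.
    return {"type": "synthesize_session.update", "session": overrides}


def find_mismatches(requested, applied):
    """Compare requested overrides with what the server reports as applied."""
    mismatches = {}
    for key, wanted in requested.items():
        got = applied.get(key)
        if got != wanted:
            logger.warning("Requested %s=%r but server applied %r", key, wanted, got)
            mismatches[key] = (wanted, got)
    return mismatches
```

Building the message from scratch means a field the user never set cannot leak into the update, and the mismatch check turns a silent fallback into a visible log line.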
Only import parse_custom_configuration and pass custom_configuration to synthesize/synthesize_online when --custom-configuration is supplied, so talk.py keeps working against older riva-client wheels that lack the function and the kwarg.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
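One way to express this guard is sketched below. `build_synthesize_kwargs` and `load_parser` are hypothetical; `load_parser` stands in for whatever performs the deferred import of parse_custom_configuration, whose module path is not stated in this PR.

```python
def build_synthesize_kwargs(custom_configuration, load_parser):
    """Return extra kwargs for synthesize() only when the option was supplied.

    load_parser is a zero-argument callable that imports and returns the
    parsing function; it is never called when the option is absent, so older
    wheels without that function are not touched.
    """
    if not custom_configuration:
        return {}
    parse = load_parser()
    return {"custom_configuration": parse(custom_configuration)}
```

The caller would then write something like `service.synthesize(text, **extra)`, so the kwarg is simply absent on the default path.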
cli_main and EXIT_BAD_INPUT were added recently in argparse_utils and are not present in older riva-client wheels. Wrap their imports in a try/except across all asr/nmt/tts client scripts, falling back to a no-op decorator and EXIT_BAD_INPUT=2 so the scripts keep running against older installed wheels (only the structured exit codes are lost in that case).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
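The fallback described above can be sketched as a guarded import at the top of each script. The module path `riva.client.argparse_utils` is assumed from the commit text, and the exact form used in the scripts may differ.

```python
try:
    # Present only in newer riva-client wheels.
    from riva.client.argparse_utils import cli_main, EXIT_BAD_INPUT
except ImportError:
    EXIT_BAD_INPUT = 2

    def cli_main(func):
        """No-op fallback: the script still runs, without structured exit codes."""
        return func


@cli_main
def main():
    return 0
```

Catching ImportError covers both a missing package and a package that is installed but predates these two names.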
fix: nmt: surface EXIT_BAD_INPUT when --text-file has no non-empty lines
Aligns the default TTS sample rate (22050 Hz) in scripts/tts/talk.py and in riva.client.SpeechSynthesisService synthesize/synthesize_online with the HTTP /v1/audio/synthesize default, so the same call over either transport yields the same audio when the rate is left unspecified.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
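As a small illustration of keeping the CLI and library defaults in step, the sketch below shares one constant between an argparse option and a function signature. The 22050 Hz value is taken from this commit's title; the flag name `--sample-rate-hz` and the function `synthesize_sketch` are assumptions, not the actual talk.py or SpeechSynthesisService code.

```python
import argparse

DEFAULT_SAMPLE_RATE_HZ = 22050  # value named in the commit title


def build_parser():
    parser = argparse.ArgumentParser(description="TTS sample-rate default sketch")
    parser.add_argument(
        "--sample-rate-hz",  # assumed flag name
        type=int,
        default=DEFAULT_SAMPLE_RATE_HZ,
        help="Output sample rate in Hz.",
    )
    return parser


def synthesize_sketch(text, sample_rate_hz=DEFAULT_SAMPLE_RATE_HZ):
    """Hypothetical stand-in that only records the parameters it would send."""
    return {"text": text, "sample_rate_hz": sample_rate_hz}
```

Sharing the constant means the CLI and a direct library call cannot drift apart when the rate is left unspecified.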
rmittal-github approved these changes May 26, 2026
rmittal-github pushed a commit that referenced this pull request May 26, 2026
* Make pyaudio an optional dependency in audio_io
* Add cli_main decorator with structured CLI exit codes
* Wire cli_main into asr/nmt/tts client scripts and tighten input validation
* Send override-only payload for realtime TTS session update
* Guard TTS custom_configuration usage for backwards compatibility
* Guard cli_main/EXIT_BAD_INPUT imports for backwards compatibility
* fix: nmt: surface EXIT_BAD_INPUT when --text-file has no non-empty lines
* Default TTS sample rate to 22050 Hz to match HTTP API
* Addressing review comments
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Yuvaraj Dharavath <ydharavath@nvidia.com>